Structuration and Enrichment of HTML Documents in order to Build a Specific Information Warehouse

Authors

  • Josiane Mothe
  • Franck Ravat
  • Farshad Riahi
  • Gilles Zurfluh
Abstract

This paper presents a process that enriches the representation of Web documents in order to supply an information warehouse and to allow more precise queries than Web search engines support. The information warehouse is stored in an object-oriented database (OODB) so that powerful set-based query languages can be used. One of the paper's main contributions is the enrichment of HTML documents as they are loaded into the warehouse. This enrichment is based on decomposing each document into components and indexing those components. Both processes take into account the logical and hyperlink structures of Web documents as well as their appearance. A prototype has been developed using the OODBMS O2.
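
The abstract does not spell out the decomposition or indexing rules, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes components are delimited by HTML headings (logical structure), that `<a href>` targets are kept per component (hyperlink structure), and that bold, italic, or underlined terms are recorded as emphasized (appearance). The weights 3/2/1, the names `Component`, `Decomposer`, `build_index`, and `query`, and the sample page with its `example.org` link are all hypothetical, and plain Python dictionaries stand in for the O2 object database used in the prototype.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}   # logical-structure markers (assumed)
EMPHASIS = {"b", "strong", "i", "em", "u"}         # appearance markers (assumed)


def tokenize(text):
    """Lower-case alphanumeric terms (no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Component:
    """Hypothetical unit: a heading plus the content up to the next heading."""
    cid: int
    level: int                                     # 0 = text before the first heading
    title: str = ""
    text: list = field(default_factory=list)
    links: list = field(default_factory=list)      # hyperlink targets in this unit
    emphasized: set = field(default_factory=set)   # terms shown in bold/italic/underline


class Decomposer(HTMLParser):
    """Split one HTML page into heading-delimited components."""

    def __init__(self):
        super().__init__()
        self.components = [Component(cid=0, level=0)]
        self.in_heading = False
        self.emph_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # A new heading opens a new component at that heading depth.
            self.components.append(Component(cid=len(self.components), level=int(tag[1])))
            self.in_heading = True
        elif tag in EMPHASIS:
            self.emph_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.components[-1].links.append(href)

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self.in_heading = False
        elif tag in EMPHASIS and self.emph_depth > 0:
            self.emph_depth -= 1

    def handle_data(self, data):
        comp = self.components[-1]
        if self.in_heading:
            # Join heading fragments with single spaces.
            comp.title = " ".join((comp.title + " " + data).split())
        else:
            comp.text.append(data)
        if self.emph_depth:
            comp.emphasized.update(tokenize(data))


def build_index(components):
    """Inverted index term -> {component id: weight}; weights 3/2/1 are assumed."""
    index = defaultdict(dict)
    for c in components:
        sources = ((tokenize(c.title), 3), (c.emphasized, 2), (tokenize(" ".join(c.text)), 1))
        for terms, weight in sources:
            for t in terms:
                index[t][c.cid] = index[t].get(c.cid, 0) + weight
    return index


def query(index, components, terms, link_contains=None):
    """Set-based query: components containing ALL terms, optionally linking to a URL."""
    sets = [set(index.get(t.lower(), {})) for t in terms]
    hits = set.intersection(*sets) if sets else set()
    if link_contains:
        hits = {cid for cid in hits if any(link_contains in h for h in components[cid].links)}
    return sorted(hits)


page = """<html><body><p>Intro text</p>
<h1>Information warehouse</h1><p>An <b>OODB</b> stores components.</p>
<h2>Indexing</h2><p>Each component is indexed; see <a href="http://example.org/o2">O2</a>.</p>
</body></html>"""

d = Decomposer()
d.feed(page)
d.close()
idx = build_index(d.components)
print([(c.cid, c.level, c.title) for c in d.components])
# [(0, 0, ''), (1, 1, 'Information warehouse'), (2, 2, 'Indexing')]
print(query(idx, d.components, ["component"], link_contains="o2"))  # [2]
```

The last call intersects per-term component sets and then filters on a hyperlink target, which illustrates the kind of component-level, structure-aware query that a whole-page keyword search does not directly express.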

Similar sources

A Solution to View Management to Build a Data Warehouse

Several techniques exist to select and materialize a proper set of data in a suitable structure that manages the queries submitted to online analytical processing systems. These techniques are called view management techniques and cover three research areas: 1) selecting views to materialize, 2) query processing and rewriting using the materialized views, and 3) maintaining materializ...

Collaboration Between Researchers and Knowledge Users in Health Technology Assessment: A Qualitative Exploratory Study

Background: Collaboration between researchers and knowledge users is increasingly promoted because it could lead to more evidence-based decision-making and practice. These complex relationships differ in form, in the particular goals they pursue, and in whom they bring together. Although much is understood about why partnerships form, relatively little is known about how collabora...

Web Data Warehousing Convergence: From Schematic to Systematic

This paper proposes a data warehouse integration technique that combines data and documents from different underlying document and database design approaches. The well-defined and structured data such as Relational, Object-oriented and Object Relational data, semi-structured data such as XML, and unstructured data such as HTML documents are integrated into a Web data warehouse system....

Integrated Order Batching and Distribution Scheduling in a Single-block Order Picking Warehouse Considering S-Shape Routing Policy

In this paper, a mixed-integer linear programming model is proposed to integrate batch picking and distribution scheduling problems in order to optimize them simultaneously in an order picking warehouse. A two-phase heuristic algorithm is presented to solve it in reasonable time. The first phase uses a genetic algorithm to evaluate and select permutations of the given set of customers. The seco...

Document Warehousing Based on a Multimedia Database System

Nowadays, structured data such as sales and business forms are stored in data warehouses for decision makers to use. Further, unstructured data such as emails, HTML texts, images, videos, and office documents are increasingly accumulated in personal computer storage due to the spread of mailing, the WWW, and word processing. Such unstructured data, or what we call multimedia documents, are larger in vo...

Publication date: 2000